Raster Data Processing

Advanced Geospatial Data Processing for Social Scientists

Dennis Abel & Stefan Jünger

2025-04-28

Day Time Title
April 28 10:00-11:15 Introduction
April 28 11:15-11:30 Coffee Break
April 28 11:30-13:00 Raster data in R
April 28 13:00-14:00 Lunch Break
April 28 14:00-15:15 Raster data processing
April 28 15:15-15:30 Coffee Break
April 28 15:30-17:00 Graphical display of raster data in maps
April 29 10:00-11:15 Datacube processing I
April 29 11:15-11:30 Coffee Break
April 29 11:30-13:00 Datacube processing II & API access
April 29 13:00-14:00 Lunch Break
April 29 14:00-15:15 Data integration and linking (with survey data)
April 29 15:15-15:30 Coffee Break
April 29 15:30-17:00 Outlook and open session with own application

Our plan

Thus far, we have learned about

  • Different data formats
  • How to load them
  • First steps in interacting with them

In this session, you will learn

  • How to wrangle raster data even further
  • Linking to vector data
  • Manipulating the raster values
  • How to converse from one format into the other

Subsetting

Cropping raster data

Cropping is a method of cutting out a specific ‘slice’ of a raster layer based on an input dataset or geospatial extent, such as a bounding box. We often do this to ‘zoom’ in on a dataset or to make our computations more efficient. Let’s pretend, we are mainly interested in the small-scale population size of Eastern Germany. For this purpose, we can use the bounding box of Eastern Germany.

Bounding box in R

The easiest way (in my opinion) to create a bounding box in R is to use the sf::st_bbox() function, possibly based on another geospatial dataset (and not ChatGPT).

bbox_east <- 
  sf::st_read("./data/VG250_LAN.shp") |> 
  dplyr::filter(SN_L %in% 11:16, GF == 4) |> 
  sf::st_transform(25832) |> 
  sf::st_bbox() |> 
  sf::st_as_sfc(crs = 25832)

bbox_east
Reading layer `VG250_LAN' from data source 
  `C:\Users\mueller2\a_talks_presentations\advanced_geospatial\data\VG250_LAN.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 35 features and 23 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 280371.1 ymin: 5235856 xmax: 921292.4 ymax: 6106244
Projected CRS: ETRS89 / UTM zone 32N
Geometry set for 1 feature 
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 562009.3 ymin: 5562814 xmax: 921292.4 ymax: 6068746
Projected CRS: ETRS89 / UTM zone 32N

Cropping the Census data on population numbers

Now, cropping is easy using the terra::crop() function.

pop_grid_2022 <- terra::rast("./data/z22/population.tif")

pop_grid_2022_crop <- terra::crop(pop_grid_2022, bbox_east)

terra::ext(pop_grid_ger_2020)
terra::ext(pop_grid_ger_2020_crop)
SpatExtent : 274144.445273998, 926449.115580001, 5236495.3402849, 6104900.94431191 (xmin, xmax, ymin, ymax)
SpatExtent : 562279.023568674, 921446.779151274, 5562647.6754379, 6068884.12202507 (xmin, xmax, ymin, ymax)

Note the different spatial extents.

Plotting the different extents

terra::plot(pop_grid_2022)

terra::plot(pop_grid_2022_crop)

Wait a minute…

…does these data solely comprise Eastern German states? Let’s have a look at their shapes.

east_german_states <- 
  sf::st_read(
    "./data/VG250_LAN.shp",
    quiet = TRUE
  ) |> 
  dplyr::filter(SN_L %in% 11:16, GF == 4)

plot(pop_grid_ger_2020_crop)
plot(
  sf::st_geometry(east_german_states), 
  border = "white", 
  col = scales::alpha("white", .2), 
  lwd = 4, 
  add = TRUE
)

First solution: terra::mask()

Masking is similar to cropping, yet values outside the extent are set to missing values (NA).

pop_grid_2022_mask <- 
  terra::mask(
    pop_grid_2022, 
    terra::vect(east_german_states) # !
  )

plot(pop_grid_2022_mask)

Second and best solution: combining masking and cropping

pop_grid_2022_crop <- 
  terra::mask(
    pop_grid_2022, 
    terra::vect(east_german_states) # !
  ) |> 
  terra::crop(east_german_states)

plot(pop_grid_2022_crop)

Extraction & Aggregation

Changes in terminology

If we only want to add one attribute from a vector dataset Y to another vector dataset X, we can conduct a spatial join using sf::st_join() as shown earlier. In the raster data world, these operations are called raster extractions.

Raster data are helpful when we aim to

  • Apply calculations that are the same for all geometries in the dataset
  • Extract information from the raster fast and efficient

Pulling in some data

For this effort, we re-import the synthetic survey geocoordinates. We sample 100 geocoordinates from the whole dataset, as we only need a few for demonstrating the procedures that are following.

points <- 
  readRDS(
    "./data/synthetic_survey_coordinates.rds"
  ) |> 
  dplyr::sample_n(size = 100) |> 
  sf::st_transform(25832)

points
Simple feature collection with 100 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 314498.3 ymin: 5281110 xmax: 868183.5 ymax: 6009895
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 100 × 3
      id           geometry foreigner
 * <int>        <POINT [m]>     <int>
 1   980 (384699.7 5724497)         1
 2  1908 (483969.2 5331922)         0
 3  1772 (619176.2 5461729)         0
 4  1391 (473710.7 5939657)         1
 5  1891 (645982.7 5626083)         0
 6  1072 (340190.8 5468907)         0
 7  1565 (376655.7 5581417)         0
 8  1356   (836603 5661627)         0
 9  1994 (767232.9 5609682)         0
10   343 (410176.9 5468902)         0
# ℹ 90 more rows

Raster extraction

To extract the raster values at a specific point by location, we use the following:

terra::extract(pop_grid_2022, points, ID = FALSE) |> 
  head(10)
       cat_0
1  2655.0781
2   392.9463
3   160.6061
4  2074.0515
5   985.0100
6  2113.3601
7  1367.2769
8  6068.7798
9   807.7233
10  255.6932

Add results to existing dataset

This information can be added to an existing dataset (our points in this example):

points$pop <- 
  terra::extract(pop_grid_2022, points, ID = FALSE)[[1]]

points
Simple feature collection with 100 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 314498.3 ymin: 5281110 xmax: 868183.5 ymax: 6009895
Projected CRS: ETRS89 / UTM zone 32N
# A tibble: 100 × 4
      id           geometry foreigner   pop
 * <int>        <POINT [m]>     <int> <dbl>
 1   980 (384699.7 5724497)         1 2655.
 2  1908 (483969.2 5331922)         0  393.
 3  1772 (619176.2 5461729)         0  161.
 4  1391 (473710.7 5939657)         1 2074.
 5  1891 (645982.7 5626083)         0  985.
 6  1072 (340190.8 5468907)         0 2113.
 7  1565 (376655.7 5581417)         0 1367.
 8  1356   (836603 5661627)         0 6069.
 9  1994 (767232.9 5609682)         0  808.
10   343 (410176.9 5468902)         0  256.
# ℹ 90 more rows

More elaborated: spatial buffers

Sometimes, extracting information 1:1 is not enough.

  • It’s too narrow
  • There is missing information about the surroundings of a point

Buffer extraction

We can use spatial buffers of different sizes to extract information on surroundings:

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  1263.5822
2   513.2711
3   345.3904
4  1490.8691
5   467.4778
6   738.7713
7   634.7842
8  3220.1300
9   598.0971
10  260.1378
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 5000),
  fun = mean,
  na.rm = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1   568.4251
2   369.1342
3   430.0407
4   999.8300
5   209.4865
6   873.8573
7   460.5101
8  2692.1003
9   572.5220
10  215.4286

Extraction toggles: touches

There’s a multitude of arguments that we can adjust to conduct the extraction. An important option is on from which raster cells we want our extraction to be done. Here’s an example how the argument terra::extract(..., touches = FALSE/TRUE) works.

touches = FALSE vs. touches = TRUE

Let’s see how the values differ when we apply the option or not.

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  1263.5822
2   513.2711
3   345.3904
4  1490.8691
5   467.4778
6   738.7713
7   634.7842
8  3220.1300
9   598.0971
10  260.1378
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = mean,
  na.rm = TRUE,
  touches = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1   939.6127
2   483.2634
3   341.2769
4  1457.0143
5   401.7546
6   722.7065
7   567.5027
8  3177.0263
9   623.3190
10  211.7880

Extraction function

Often we default on the mean of raster cell values to be extracted. As we deal with population counts in our example, maybe calculating the sum is more relevant. Or the maximum?

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = sum,
  na.rm = TRUE,
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  26535.226
2   6159.253
3   5871.636
4  26835.644
5   5142.256
6  12559.112
7  12695.684
8  61182.470
9   9569.554
10  3121.654
terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = max,
  na.rm = TRUE,
  touches = TRUE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
       cat_0
1  3457.1177
2  1292.5817
3  1154.7985
4  3767.6599
5  2032.4690
6  2752.8694
7  2339.7739
8  6575.6094
9  1308.7130
10  596.8067

Custom functions

We can even define custom functions.

terra::extract(
  pop_grid_2022, 
  sf::st_buffer(points, 2500),
  fun = function (x) {max(x, na.rm = TRUE) / sum(x, na.rm = TRUE)},
  touches = FALSE,
  ID = FALSE,
  raw = TRUE
) |> 
  head(10)
        cat_0
1  0.13028409
2  0.20986012
3  0.19667404
4  0.14039759
5  0.39524849
6  0.17946215
7  0.18429679
8  0.09919148
9  0.13675800
10 0.19118286

Raster aggregation

We can use the same procedure to aggregate a raster dataset into a vector polygon dataset. That’s a widespread use case. Let’s load our German districts vector dataset in ./data/. And this time, we use raster data on gender ratios from 2020.

german_districts <- 
  sf::st_read(
    "./data/VG250_KRS.shp",
    quiet = TRUE
  ) |> 
  sf::st_transform(3035)

gender_ratios_2020 <-
  terra::rast("./data/gender_ratio_2020.tif")

plot(gender_ratios_2020)
plot(
  sf::st_geometry(german_districts), 
  border = "white", 
  col = NA, 
  lwd = .5, 
  add = TRUE
)

Adding the aggregated data

Again, we simply use terra::extract() to create the aggregated data, which we then can add to our vector dataset.

german_districts$gender_ratios_2020 <-
  terra::extract(
    gender_ratios_2020, 
    german_districts, 
    fun = mean, 
    na.rm = TRUE, 
    ID = FALSE
  ) |> 
  unlist()

plot(german_districts["gender_ratios_2020"])

Manipulating the raster data

Digging deeper

The previous steps are more or less efforts than work on the raw raster cell values data. However, there are occasions where we might want to work on the raster data themselves or at least pre-process them to add them to another dataset (e.g., a vector file). Some of these procedures are for visualization, such as heatmaps, others are necessary for our later analyses. We will show a few in the following.

Creating a quick ‘heatmap’

Population counts for the whole of Germany helps us to identify urban and rural clusters. But there may be applications where we need to zoom in on a specific city to identify within-city variations, also regarding foreinger shares. Let’s do that for the German capital Berlin. For this purpose, we subset the district data and also mask and crop data on foreigner shares from the German census.

berlin <-
  german_districts |> 
  dplyr::filter(AGS == "11000")

foreigners_berlin <-
  terra::rast("./data/z22/foreigners.tif") |> 
  terra::project("EPSG:3035") |> 
  terra::mask(terra::vect(berlin)) |> 
  terra::crop(berlin)

plot(foreigners_berlin)

It’s easy

Although, we already can identify high density areas using the raw data, some smoothing may be helpful. We can use the terra::focal() function to do that. It applies a moving window filter on all raster cells of a grid. We’ll have a more detailed look at this function in a minute.

foreigners_berlin_smooth <- 
  terra::focal(
    foreigners_berlin, 
    w = matrix(1, 5, 5), 
    fun = mean, 
    na.rm = TRUE
  )

plot(foreigners_berlin_smooth)

Real point pattern analysis

Usually, when we talk about heat maps we mean analyzing point patterns and whether they spatially cluster. terra might not actually be your best choice when it comes to more elaborated techniques to estimate density kernels and distance base bandwidths between points to draw clusters. For this purpose, packages such as spatstat are well more suited but they require learning about other data structures.[^https://r-spatial.org/book/11-PointPattern.html] That said, densities are more advanced ways of counting things in raster grid cells, and we can mimic this behavior also with terra. So let’s stick to this package and crop our points to the extent of Berlin.

points_berlin <- 
  readRDS(
    "../../data/synthetic_survey_coordinates.rds"
  ) |>  
  sf::st_crop(berlin)

plot(sf::st_geometry(points_berlin))

Creating a raster density template

Next, for our density estimation we simply want to count points with the attribute foreigner = 1 in raster grid cells. However, our point are unfortunately a bit sparse compared to our comprehensive raster dataset. We cannot rely on 1 km² grid cells initially as with the foreigner grid data. But 5 km² may be a good compromise. Let’s do that!

raster_template <-
  points_berlin |> 
  dplyr::filter(foreigner == 1) |> 
  sf::st_bbox() |> 
  terra::ext() |> 
  terra::rast(resolution = 5000, crs = "EPSG:3035")

raster_template
class       : SpatRaster 
dimensions  : 7, 7, 1  (nrow, ncol, nlyr)
resolution  : 4857.143, 4857.143  (x, y)
extent      : 4534500, 4568500, 3255500, 3289500  (xmin, xmax, ymin, ymax)
coord. ref. : ETRS89-extended / LAEA Europe (EPSG:3035) 

Why the hassle?

You may wonder why we are doing that. The answer is simple: We want to count the number of points in each of the grid cells. We use the function terra::rasterize() as a simple technique.

points_berlin_density <- 
  terra::rasterize(
    points_berlin, 
    raster_template, 
    fun = "sum", 
    background = 0
  ) 

plot(points_berlin_density)

Again, why the hassle?

Now, you may think this is disappointing. These data looks… not good. But fear not, working with raster data is powerful, as we now use a function you already now for projecting one CRS into another: terra::project(). This function can also be used to aggregate and disaggregate data based on the structure of another dataset. So, while we were not able to use 1 km² grid cells for our density ‘estimation’ initially, we can reproject our 5 km² onto a 1 km² grid like our foreigners grid. A bit of masking also helps in getting rid of cells that are not within the border of Berlin

points_berlin_density <-
  points_berlin_density |> 
  terra::project(foreigners_berlin) |> 
  terra::mask(foreigners_berlin)

plot(points_berlin_density)

Smoothing

We can also apply smoothing as with the population grid data.

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 5, 5), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 5, 5), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Playing with the smoothing

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 3, 3), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 3, 3), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Playing with the smoothing

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 9, 9), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  points_berlin_density, 
  w = matrix(1, 9, 9), 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

What is this argument w?

It’s magic. Just kidding, but it is indeed really powerful. And it makes weird stuff, particularly with our point based densities. It builds around the idea of connecting the value of a focal grid cell to the values of surrounding grid cells, as in the figure below. Hence, the name of the function terra::focal() where the argument is used

It’s just a simple base R matrix

When using this matrix as input and applying the statistic fun = mean, we simply change the value of the focal grid cell to the mean of the values of itself and the 8 surrounding grid cells. That’s what we did before in one example.

matrix(1, 5, 5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    1    1    1    1    1
[3,]    1    1    1    1    1
[4,]    1    1    1    1    1
[5,]    1    1    1    1    1

But we can change that however we want:

weighted_w <- matrix(c(rep(.5, 12), 1, rep(.5, 12)), 5, 5)

weighted_w
     [,1] [,2] [,3] [,4] [,5]
[1,]  0.5  0.5  0.5  0.5  0.5
[2,]  0.5  0.5  0.5  0.5  0.5
[3,]  0.5  0.5  1.0  0.5  0.5
[4,]  0.5  0.5  0.5  0.5  0.5
[5,]  0.5  0.5  0.5  0.5  0.5

Applying it to the foreigners grid data

terra::focal(
  foreigners_berlin, 
  w = matrix(1, 5, 5), 
  fun = mean, 
  na.rm = TRUE
) |> 
  plot()

focal(
  foreigners_berlin, 
  w = weighted_w, 
  fun = mean,
  na.rm = TRUE
) |> 
  plot()

Real life example: Edges of immigrant rates

In ethnic diversity research, it is a relevant question as to whether sudden changes in the neighborhood composition may rise the potential of conflicts between groups. Researchers use edge detection algorithms from image processing to investigate such changes spatially.

What is edge detection?

Edge detection identifies sudden changes in colors in an image to ‘draw’ borders of things that are in a picture. Here’s an example of the prominent Sobel filter.

Source: https://en.wikipedia.org/wiki/Sobel_operator

We can do that as well using a Sobel filter

R is good for math, right? While this is the formula for applying the Sobel filter to a raster image…

\[r_x = \begin{bmatrix}1 & 0 & -1 \\2 & 0 & -2 \\1 & 0 & -1\end{bmatrix} \times raster\_file \\r_y = \begin{bmatrix}1 & 2 & 1 \\0 & 0 & 0 \\-1 & -2 & -1\end{bmatrix}\times raster\_file \\r_{xy} = \sqrt{r_{x}^2 + r_{y}^2}\]

Implementation in R

…we can easily translate it to be used in terra::focal()1:

sobel <- function(r) {
  fy <- matrix(c(1, 0, -1, 2, 0, -2, 1, 0, -1), nrow = 3)
  fx <- matrix(c(-1, -2, -1, 0, 0, 0, 1, 2, 1) , nrow = 3)
  rx <- terra::focal(r, fx)
  ry <- terra::focal(r, fy)
  sqrt(rx^2 + ry^2)
}

foreigners_berlin_edges <- sobel(foreigners_berlin_smooth)

Comparison

We can now clearly display the edges of sudden changes in population sizes within the city of Berlin!

plot(foreigners_berlin_smooth)
plot(foreigners_berlin_edges)

Conversion

Raster to points

raster_now_points <-
  foreigners_berlin |> 
  terra::as.points()

plot(raster_now_points)

Points to raster

raster_target_layer <- 
  terra::ext(raster_now_points) |> 
  terra::rast(res = 1000)

points_now_raster <- 
  raster_now_points |> 
  terra::rasterize(
    raster_target_layer, 
    field = "cat_0", 
    fun = "mean",
    background = 0
  )

plot(points_now_raster)

Raster to polygons

polygon_raster <-
  foreigners_berlin |>  
  terra::as.polygons() |> 
  sf::st_as_sf()

plot(polygon_raster)